Intro to NGS processing

James A. Fellows Yates

2021-08-17

Who am I?

  • Education
    • B.Sc. Bioarchaeology (University of York, UK)
    • M.Sc. Naturwissenschaftliches Archäologie (University of Tübingen, DE)
    • Ph.D. Archaeogenetics (MPI-SHH / MPI-EVA, DE)
  • Experience
    • Number of genetics classes taken: 0
    • Number of bioinformatics classes taken: 0

@jfy133

Today we will

  1. Introduce what DNA sequencing is
  2. Explain how Illumina NGS sequencing data is generated
  3. Show how to evaluate NGS data [Practical]

Introduction to DNA

What is DNA?

Deoxyribonucleic acid (/diːˈɒksɪˌraɪboʊnjuːˌkliːɪk, -ˌkleɪ-/; DNA) is a molecule composed of two polynucleotide chains that coil around each other to form a double helix carrying genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses. - Wikipedia

What is DNA?

DNA structure

The rules

  • Four nucleotides
    • Pyrimidines: Cytosine, Thymine
    • Purines: Guanine, Adenine
  • Base pairing: one pyrimidine with one purine
    • C with G (think: CGI)
    • A with T (think: AT-AT walker)
  • Complementary
    • C on one strand, G on the other (or v.v.)
    • A on one strand, T on the other (or v.v.)
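
The pairing rules above fit in a small lookup table. This is only a minimal Python sketch: the names `PAIRS` and `is_complementary` are made up for illustration, and the two strands are compared position by position, ignoring strand direction for simplicity.

```python
# Base-pairing rules: each pyrimidine pairs with exactly one purine.
PAIRS = {"C": "G", "G": "C", "A": "T", "T": "A"}


def is_complementary(strand_a: str, strand_b: str) -> bool:
    """Return True if every base in strand_a pairs with the base opposite it."""
    if len(strand_a) != len(strand_b):
        return False
    return all(PAIRS.get(a) == b for a, b in zip(strand_a, strand_b))


print(is_complementary("CATG", "GTAC"))  # True
print(is_complementary("CATG", "GTAA"))  # False
```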

AT-AT Walker

The rules

  • Make a copy of a DNA strand with a polymerase
    • Unwind the DNA
    • Separate the strands
    • Make the new strand: find a C, add a new G (etc.)
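
The copying steps above can be sketched in a few lines of Python. The function names `copy_strand` and `replicate` are hypothetical, and the sketch ignores strand direction and everything else a real polymerase has to deal with.

```python
PAIRS = {"C": "G", "G": "C", "A": "T", "T": "A"}


def copy_strand(template: str) -> str:
    """Build the new strand one base at a time, as a polymerase would."""
    new_strand = []
    for base in template:
        new_strand.append(PAIRS[base])  # find a C, add a G (etc.)
    return "".join(new_strand)


def replicate(strand_a: str, strand_b: str) -> list[tuple[str, str]]:
    """Separate the two strands and give each one a freshly made partner."""
    return [(strand_a, copy_strand(strand_a)), (strand_b, copy_strand(strand_b))]


print(replicate("CATG", "GTAC"))
# [('CATG', 'GTAC'), ('GTAC', 'CATG')]
```

One double-stranded molecule goes in, two identical double-stranded molecules come out.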

DNA replication split

How do we get DNA?

Figure 17 01 02

Introduction to DNA Sequencing

What is Sequencing?

Converting the chemical nucleotides of a DNA molecule

to

ACTG on your computer screen

Historically

  • Sanger sequencing

Sanger-sequencing

  • Separate strands, add primer (starting point)
  • Add mix of nucleotides, some with special ‘terminators’
  • Pass through size-filtering, read order of terminators
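
A toy Python version of these three steps, as a rough sketch only. It assumes exactly one terminated fragment per position (in reality termination is random and you get many copies of each length), and the names `sanger_fragments` and `read_sequence` are invented for this example.

```python
import random

PAIRS = {"C": "G", "G": "C", "A": "T", "T": "A"}


def sanger_fragments(template: str) -> list[str]:
    """Copy the template, stopping once at every position (one fragment each)."""
    new_strand = "".join(PAIRS[base] for base in template)
    fragments = [new_strand[:end] for end in range(1, len(new_strand) + 1)]
    random.shuffle(fragments)  # in the tube, fragments are all mixed together
    return fragments


def read_sequence(fragments: list[str]) -> str:
    """Sort fragments by size, then read the terminator (last base) of each."""
    by_size = sorted(fragments, key=len)
    return "".join(fragment[-1] for fragment in by_size)


print(read_sequence(sanger_fragments("GATTACA")))  # CTAATGT
```

The sequence read out is that of the newly made strand, i.e. the complement of the template.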

Pros and cons of Sanger Sequencing

  • Pros
    • More accurate (fewer errors)
    • Longer reads
  • Cons
    • Resource heavy: needs a lot of input DNA
    • Slow: one. fragment. at. a. time.

What is NGS?

  • NGS: Next Generation Sequencing
    • MASSIVELY multiplexed!
    • Sequence millions and even billions of DNA reads at once!

Not really ‘next’ anymore; consider it more ‘second’ generation (see: Nanopore)

What is NGS?

Market leader:

Illumina HiSeq 2500

(Others: Roche 454, PacBio, IonTorrent etc.)

How does it work?

  • Basically the same concept, but:
    • no size separation
    • with pretty pictures!

i.e. attach fluorescent nucleotides, (normally) one colour per base

A

G

T

C

Fire mah lazer, and take a picture! Rinse and repeat!

Where does this happen?

On a ‘flow cell’

Next generation sequencing slide

Where does this happen?

But how do you get your DNA to attach to the lawn?

  • Convert it to a library:
    • Add adapters
    • Add indexes
    • Add priming sites

AATGATACGGCGACCACCACaccgacaaCCCTACACGACGCTCTTCCGATCTXXXXXXAGCACACGTCTGAACTCCAGTCACgacactaCCGTCTTCTGCTTG
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
TTACTATGCCGCTGGTGGTGtggctgttGGGATGTGCTGCGAGAAGGCTAGAXXXXXXTCGTGTGCAGACTTGAGGTCAGTGctgtgatGGCAGAAGACGAAC
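
To see the parts of a library molecule, here is a small Python sketch that picks the diagram apart. It rests on an assumption about how the diagram is drawn: that the lower-case stretches mark the indexes and the run of X's marks the insert (your DNA), with the upper-case remainder being adapter and priming-site sequence. The variable names are made up for illustration.

```python
import re

# Top strand of the library molecule, copied from the diagram above.
LIBRARY_TOP = (
    "AATGATACGGCGACCACCACaccgacaaCCCTACACGACGCTCTTCCGATCT"
    "XXXXXX"
    "AGCACACGTCTGAACTCCAGTCACgacactaCCGTCTTCTGCTTG"
)

# Assumption: lower-case runs are the indexes, the X run is the insert.
indexes = re.findall(r"[a-z]+", LIBRARY_TOP)
insert = re.search(r"X+", LIBRARY_TOP).group()

print(indexes)  # ['accgacaa', 'gacacta']
print(insert)   # XXXXXX
```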

Sequencing-by-synthesis

Once attached, make lots of copies (clustering)

Sequencing-by-synthesis

Separate, add primer

Sequencing-by-synthesis

Add the fluorescent nucleotides; only the complementary one will bind

Sequencing-by-synthesis

Fire the lazer, and take a photo

Rinse and repeat!
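
The cycle above, as a toy Python loop. The colour assignment in `COLOUR_OF` is purely hypothetical (real instruments and chemistries differ), and `sequence_by_synthesis` is a made-up name; the sketch also pretends the camera never makes a mistake.

```python
PAIRS = {"C": "G", "G": "C", "A": "T", "T": "A"}

# Hypothetical colour assignment, one colour per base (an assumption).
COLOUR_OF = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_OF = {colour: base for base, colour in COLOUR_OF.items()}


def sequence_by_synthesis(template: str, cycles: int) -> str:
    """Each cycle adds one fluorescent base and takes one photo."""
    read = []
    for base in template[:cycles]:
        incoming = PAIRS[base]        # only the complement will bind
        photo = COLOUR_OF[incoming]   # fire the laser, take a photo
        read.append(BASE_OF[photo])   # turn the colour back into a letter
        # rinse and repeat!
    return "".join(read)


print(sequence_by_synthesis("GATTACA", cycles=5))  # CTAAT
```

The number of cycles sets the read length: five cycles, five bases.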

Improving quality

Throughput limits

Paired end

Paired end sequencing

Sequence one end, bend the molecule over, attach the other end (turnaround), and start again from the other end of the molecule
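
What the two reads of a pair look like, in a minimal Python sketch. The function names are invented for illustration, and the sketch simplifies which strand each read reports: read 1 is taken as the start of the fragment as written, read 2 as the start of its reverse complement (i.e. read inwards from the far end).

```python
PAIRS = {"C": "G", "G": "C", "A": "T", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Complement every base and read the result from the other end."""
    return "".join(PAIRS[base] for base in reversed(seq))


def paired_end_reads(fragment: str, read_length: int) -> tuple[str, str]:
    """Read inwards from both ends of the same fragment."""
    read_1 = fragment[:read_length]
    read_2 = reverse_complement(fragment)[:read_length]
    return read_1, read_2


print(paired_end_reads("GATTACAGGC", read_length=4))
# ('GATT', 'GCCT')
```

If the fragment is shorter than twice the read length, the two reads overlap in the middle, so those bases get sequenced twice.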

Cons of NGS sequencing

  • Less accurate (the laser/photo can get the base wrong)
  • Chemistry limits (DNA strands age through heat cycling for denaturing; dirty environment from suboptimal wash steps, etc.) mean short reads (compensated for by volume)